Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance
Authors
Abstract
This paper presents our extractive summarization systems for the update summarization track of TAC 2009. They are based on a document summarization framework we recently developed from the theory of conditional information distance among many objects. We define the best summary as the one with the minimum information distance to the entire document set, and the best update summary as the one with the minimum conditional information distance to a document cluster, given that a prior document cluster has already been read. Experiments on the TAC dataset show that our method performs well in many categories.
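Kolmogorov complexity K is uncomputable, so any working system has to approximate it. The sketch below illustrates the two definitions above under our own assumptions, not the authors' TAC 2009 implementation: it uses zlib compressed length as a stand-in for K, approximates K(x|y) by C(yx) - C(y), models "a prior cluster has already been read" by prepending that cluster as shared context, and picks the summary by brute force over fixed-size sentence subsets. The helper names (`c`, `cond`, `info_distance`, `best_summary`) are hypothetical.

```python
from __future__ import annotations

import itertools
import zlib


def c(text: str) -> int:
    """Compressed length in bytes: a computable stand-in for Kolmogorov complexity K(text)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def cond(x: str, y: str) -> int:
    """Approximate K(x | y) as C(y + x) - C(y): the extra bytes x costs once y is known."""
    return c(y + x) - c(y)


def info_distance(x: str, y: str, prior: str = "") -> int:
    """Approximate the information distance max{K(x|y), K(y|x)}.

    If `prior` is non-empty it is prepended as context shared by both objects,
    a crude model of "a prior cluster has already been read" (an assumption).
    """
    return max(cond(x, prior + y), cond(y, prior + x))


def best_summary(sentences: list[str], cluster: str, k: int, prior: str = "") -> str:
    """Return the k-sentence summary closest to `cluster` in (conditional) information distance.

    Brute force over all k-subsets: exponential, so only suitable for toy inputs.
    """
    if not 0 < k <= len(sentences):
        raise ValueError("k must be between 1 and len(sentences)")
    best_dist, best_text = None, ""
    for combo in itertools.combinations(sentences, k):
        summary = " ".join(combo)
        dist = info_distance(summary, cluster, prior)
        if best_dist is None or dist < best_dist:
            best_dist, best_text = dist, summary
    return best_text


if __name__ == "__main__":
    earlier = "A storm hit the coast on Monday. Power was cut to thousands of homes."
    later = (
        "Power was restored on Wednesday. Crews repaired lines damaged by the storm. "
        "Officials estimated the storm damage on Wednesday."
    )
    candidates = [
        "Power was restored on Wednesday.",
        "A storm hit the coast on Monday.",
        "Officials estimated the storm damage.",
    ]
    # Generic summary: minimum information distance to `later`.
    print(best_summary(candidates, later, k=1))
    # Update summary: minimum conditional distance to `later`, given `earlier`.
    print(best_summary(candidates, later, k=1, prior=earlier))
```

Compressed length is a common practical proxy for K, but on inputs this short the zlib header overhead dominates, so the toy rankings are only indicative. A realistic system would also need a length budget and a greedy or approximate search instead of enumerating every subset.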
Similar resources
Tsinghua University at the Summarization Track of TAC 2008
This paper presents our extractive summarization systems for the update summarization track of TAC 2008. We propose two novel methods: one based on information distance theory, and the other based on sentence centrality, which derives from the concept of centrality in graph theory. The evaluation results show that the two submitted runs are very competitive in generating extractive ...
Automatic Summarization from Multiple Documents (Extended Abstract)
Since the late 1950s and Luhn [Luh58], the information community has been interested in summarizing texts. The applications of such methods are countless, ranging from news summarization [WL03, BM05, ROWBG05] to scientific article summarization [TM02] and meeting summarization [NPDP05, ELH03]. Summarization has been defined as a reductive transformation of a given set of te...
Summarizing with Encyclopedic Knowledge
This paper presents a topic-driven multi-document summarization approach that relies on linking documents to Wikipedia. Wikipedia provides structural support for retrieving relevant concepts from the documents to be summarized and for quantifying the strength of the relations between them, thus expanding the topic. We identify concepts in the documents and assign them scores that describe their relevanc...
DualSum: a Topic-Model based approach for update summarization
Update summarization is a new challenge in multi-document summarization that focuses on summarizing a set of recent documents relative to another set of earlier documents. We present an unsupervised probabilistic approach to model novelty in a document collection and apply it to the generation of update summaries. The new model, called DUALSUM, ranks second or third in terms of ...
ICTCAS's ICTGrasper at TAC 2008: Summarizing Dynamic Information with Signature Terms Based Content Filtering
This paper presents our new topic-oriented multi-document summarization system used in TAC 2008. To summarize how dynamic information changes over time, we propose a novel summarization method based on content filtering with signature terms. We first define dynamic summarization according to temporal analysis and then propose the fundamental ...